Abstract

This paper addresses speaker verification domain adaptation
with inadequate in-domain data. Specifically, we explore the
cases where in-domain data sets do not include speaker labels,
contain speakers with few samples, or contain speakers with
low channel diversity. Existing domain adaptation methods are
reviewed, and their shortcomings are discussed. We derive an
unsupervised version of fully Bayesian adaptation which reduces
the reliance on rich in-domain data. When applied to domain
adaptation with inadequate in-domain data, the proposed approach
yields competitive results when the samples per speaker are
reduced, and outperforms existing supervised methods when the
channel diversity is low, even without requiring speaker labels.
These results are validated on the SRE16, which uses a highly
inadequate in-domain data set.

Index Terms: speaker verification, unsupervised domain adaptation,
Bayesian adaptation.

1.  Introduction 

In recent years, i-vectors have become the dominant representation
of speech signals for speaker verification, since they allow
the mapping of utterances of arbitrary duration to a single
low-dimensional vector [1]. Due to their low dimensionality,
sophisticated techniques can be used to model i-vectors and generate
verification scores. One such scoring method is probabilistic
linear discriminant analysis (PLDA) [2], which provides a statistical
tool for emphasizing speaker information while compensating
for undesired sources of variability.

Statistical approaches to i-vector speaker verification often
require explicit modeling of across-class and within-class
variabilities. Commonly, these sources of variability are modeled as
Gaussian distributions, and their means and covariance matrices
are trained on large sets of speech data. Typically, such data sets
may include tens of thousands of utterances, with thousands of
individual labeled speakers.

It has been widely shown that the performance of speaker
verification systems degrades when facing unseen types of data
[3, 4, 5, 6, 7]. This is caused in part by a mismatch between the
out-of-domain data used to train system hyperparameters, and
the in-domain data encountered during enrollment and testing.
However, such performance degradation can be mitigated via
domain adaptation of system parameters using in-domain development
data. Typically, in-domain sets are inadequate in some
respect, and we focus on three such properties. First, speaker
labels may not be included, making supervised approaches
unusable. Second, the data may include speakers with few samples.
Finally, the data may include speakers with low channel
diversity, where samples may exhibit similar channel types.
In the case of inadequate in-domain data, thoughtful strategies
must be employed to successfully leverage this data.

In [3], the authors treat across-class and within-class covariance
matrices as random variables, and propose using maximum
a posteriori (MAP) estimates of these parameters conditioned
on the in-domain data. In that work, point estimate
approximations are used for speaker means for the sake of
computational efficiency. However, such methods may suffer if the
in-domain set includes individual speakers with few samples
or low channel diversity. In [5], the authors present a fully
Bayesian framework for domain adaptation. This approach
explicitly models the uncertainty due to speakers with few samples,
and is therefore less sensitive to such data. However, it
still does not address the low channel diversity problem.

In this paper, we derive an unsupervised version of fully
Bayesian domain adaptation. We assume in-domain data samples
to be independent, which implies that each sample was produced
by a unique speaker, leading to a reduced reliance on a
rich in-domain data set. It follows that the proposed method
does not require speaker labels for the in-domain data. When
applied to domain adaptation with inadequate in-domain data,
the proposed technique provides competitive results for data
with few samples per speaker, and outperforms existing supervised
methods for data with low channel diversity.

This paper is organized as follows. In Sec. 2, we present
a statistical framework for i-vector domain adaptation. Sec. 3
includes a discussion of existing techniques for supervised domain
adaptation. In Sec. 4 we derive unsupervised Bayesian
adaptation. Experimental results are presented in Sec. 5, and
conclusions are provided in Sec. 6.

2.  Statistical  Framework 

In this paper, we assume the additive noise model for i-vectors:

$$ x_{mn} = y_m + c_{mn}, \qquad (1) $$

where $x_{mn}$ denotes the $n$th sample from the $m$th speaker, $y_m$
is the latent speaker component, and $c_{mn}$ is the channel
component. Speaker components are assumed Gaussian i.i.d.:

$$ p(y_m) = \mathcal{N}(y_m; \mu, \Sigma_a), \qquad (2) $$

and are collectively denoted by $\mathcal{Y}$. Channel components are
assumed Gaussian i.i.d.:

$$ p(c_{mn}) = \mathcal{N}(c_{mn}; 0, \Sigma_w). \qquad (3) $$

Domain adaptation involves using a limited set of in-domain
data to adapt hyperparameters trained in a more
resource-rich domain. Specifically, in the context of i-vector-based
speaker verification, domain adaptation refers to estimation
of the set $\{\mu, \Sigma_a, \Sigma_w\}$ based on a set of in-domain
samples, $\mathcal{X}$, and out-of-domain hyperparameter estimates
$\{\mu^{out}, \Sigma_a^{out}, \Sigma_w^{out}\}$. Let $\mathcal{X}$ contain $N_T$ samples from
$M$ speakers. We seek a probabilistic solution to domain adaptation,
and so we encode knowledge of the out-of-domain data
in prior distributions, which can be designed to reflect the
out-of-domain parameter estimates. We model $\Sigma_a$ and $\Sigma_w$ with
inverse-Wishart distributions, as:

p(S0)  =  IW(Sa;^Er‘,I/0),  (4) 

and: 

(5) 

In this way, $\Sigma_a^{out}$ and $\Sigma_w^{out}$ approximate the modes of the
respective prior distributions, and $\nu_a$ and $\nu_w$ reflect the certainty
of the priors, as discussed in [3].
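
The generative model in (1)-(3) can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions: the function name `sample_ivectors`, the toy dimensions, and the parameter values are hypothetical and do not come from the paper. It draws one speaker component per speaker and an independent channel component per sample, so the total covariance of the samples should be close to $\Sigma_a + \Sigma_w$.

```python
import numpy as np


def sample_ivectors(mu, Sa, Sw, n_speakers, n_per_speaker, rng):
    """Draw synthetic samples from the additive model x_mn = y_m + c_mn.

    Hypothetical helper for illustration only.
    Returns x with shape (n_speakers, n_per_speaker, D) and y with shape
    (n_speakers, D).
    """
    dim = mu.shape[0]
    # (2): speaker components y_m ~ N(mu, Sigma_a), one per speaker
    y = rng.multivariate_normal(mu, Sa, size=n_speakers)
    # (3): channel components c_mn ~ N(0, Sigma_w), one per sample
    c = rng.multivariate_normal(np.zeros(dim), Sw,
                                size=(n_speakers, n_per_speaker))
    # (1): each sample is its speaker component plus a channel component
    x = y[:, None, :] + c
    return x, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([1.0, -1.0])
    Sa = np.diag([2.0, 1.0])
    Sw = 0.5 * np.eye(2)
    x, _ = sample_ivectors(mu, Sa, Sw, 2000, 5, rng)
    # The pooled sample covariance should be close to Sigma_a + Sigma_w.
    print(np.cov(x.reshape(-1, 2), rowvar=False))
```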

3.  Supervised  Domain  Adaptation 

3.1.  Parameter  Estimation 

A popular approach for the estimation of across-class and
within-class covariance matrices is to use point estimates for
speaker components:

$$ \hat{y}_m = \frac{1}{N_m} \sum_{n=1}^{N_m} x_{mn}, \qquad (6) $$

where $N_m$ denotes the number of samples provided by the $m$th
speaker. Maximum likelihood parameter estimation leads to:

$$ \Sigma_a = \frac{1}{M} \sum_{m=1}^{M} (\hat{y}_m - \hat{\mu})(\hat{y}_m - \hat{\mu})^T, \qquad (7) $$

$$ \Sigma_w = \frac{1}{N_T} \sum_{m=1}^{M} \sum_{n=1}^{N_m} (x_{mn} - \hat{y}_m)(x_{mn} - \hat{y}_m)^T, \qquad (8) $$

where $\hat{\mu}$ is the sample mean of $\mathcal{X}$. Note that these equations
can be manipulated slightly to normalize the effect of individual
speakers. It can be observed from (6)-(8) that modeling speaker
components plays a vital role in parameter estimation, and
inaccurate speaker component estimates may lead to poor estimates
for $\Sigma_a$ and $\Sigma_w$.

The approximation in (6) is valid when each speaker provides
a large number of samples and includes rich channel diversity,
but may otherwise result in inaccurate estimates. If (1)
is substituted into (6), the speaker estimate can be expressed as:

$$ \hat{y}_m = y_m + \frac{1}{N_m} \sum_{n=1}^{N_m} c_{mn}. \qquad (9) $$

If channel components are truly distributed according to (3), it is
clear from (9) that the speaker component estimate is unbiased
with a variance of $\Sigma_w / N_m$. Thus, the estimate is highly variable
for speakers with few samples, but will approach the underlying
speaker component, $y_m$, as the number of samples increases.
Additionally, if a speaker provides samples from similar
channels, so that the channel components are not zero-mean,
the estimate from (9) will not approach the underlying speaker
component regardless of the value of $N_m$.

3.2.  Adaptation  Using  Speaker  Point  Estimates

In [3], the authors find approximated MAP estimates of the
in-domain parameters assuming (6):

$$ \Sigma_a = \frac{\alpha}{N_T} \sum_{m=1}^{M} N_m (\hat{y}_m - \hat{\mu})(\hat{y}_m - \hat{\mu})^T + (1 - \alpha) \Sigma_a^{out} + \alpha (1 - \alpha) (\hat{\mu} - \mu^{out})(\hat{\mu} - \mu^{out})^T, \qquad (10) $$

$$ \Sigma_w = \frac{\alpha}{N_T} \sum_{m=1}^{M} \sum_{n=1}^{N_m} (x_{mn} - \hat{y}_m)(x_{mn} - \hat{y}_m)^T + (1 - \alpha) \Sigma_w^{out}, \qquad (11) $$

where $\alpha$ is a function of the number of samples in the in-domain
set, and approaches 1 as the data set grows. However, $\alpha$ can
also be tuned manually to control the emphasis placed on the
in-domain data during adaptation. The technique was applied to
the 2013 domain adaptation challenge (DAC13) [8], where the
in-domain set was limited with respect to the number of speakers,
and showed graceful degradation as the speaker diversity
was made increasingly scarce. However, such domain adaptation
methods can suffer for in-domain sets containing limited
numbers of samples per speaker, or containing speakers with
low channel diversity.


3.3.  Bayesian  Adaptation 

In [5], the authors propose Bayesian adaptation of hyperparameters,
where speaker components are assumed to be latent random
variables, as in (2). The joint posterior distribution of the
set $\{\mu, \Sigma_a, \Sigma_w, \mathcal{Y}\}$ is approximated by a factorized form using
variational Bayes (VB) [9]:

$$ p(\mu, \Sigma_a, \Sigma_w, \mathcal{Y} \mid \mathcal{X}) \approx q(\mathcal{Y}) \, q(\mu, \Sigma_a) \, q(\Sigma_w). \qquad (12) $$

MAP parameter estimates are then found for each factoring
distribution, and used during scoring. According to variational
Bayes, the optimal factoring distributions are found iteratively.
The update equation for the distribution of speaker components
is given by [9]:

$$ \log q(\mathcal{Y}) = E_{\mu, \Sigma_a, \Sigma_w} \{ \log p(\mathcal{X}, \mathcal{Y}, \mu, \Sigma_a, \Sigma_w) \} + \text{const}, \qquad (13) $$

with analogous expressions for the other hidden variables. Central
to the variational Bayes approach is the total data log-likelihood,
which for our statistical framework is given by:

$$ \log p(\mathcal{X}, \mathcal{Y}, \mu, \Sigma_a, \Sigma_w) = \log p(\mathcal{X} \mid \mathcal{Y}, \Sigma_w) + \log p(\mathcal{Y} \mid \mu, \Sigma_a) + \log p(\mu, \Sigma_a) + \log p(\Sigma_w), \qquad (14) $$

where:

$$ \log p(\mathcal{X} \mid \mathcal{Y}, \Sigma_w) = \sum_{m=1}^{M} \sum_{n=1}^{N_m} \log \mathcal{N}(x_{mn}; y_m, \Sigma_w). \qquad (15) $$

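Applying the variational Bayes method, and using the statistical
framework described in Sec. 2, the optimal factors are determined
via the iterative equations (see [5] for details):

$$ \mu = \alpha \bar{y} + (1 - \alpha) \mu^{out}, \qquad (16) $$

$$ \Sigma_a = \alpha \left( \frac{1}{M} \sum_{m=1}^{M} \langle y_m y_m^T \rangle - \bar{y} \bar{y}^T \right) + (1 - \alpha) \Sigma_a^{out} + \alpha (1 - \alpha) (\bar{y} - \mu^{out})(\bar{y} - \mu^{out})^T, \qquad (17) $$

$$ \Sigma_w = \frac{\alpha}{N_T} \sum_{m=1}^{M} \sum_{n=1}^{N_m} \left( x_{mn} x_{mn}^T - \langle y_m \rangle x_{mn}^T - x_{mn} \langle y_m \rangle^T + \langle y_m y_m^T \rangle \right) + (1 - \alpha) \Sigma_w^{out}, \qquad (18) $$

where:

$$ \bar{y} = \frac{1}{M} \sum_{m=1}^{M} \langle y_m \rangle, \qquad (19) $$

$$ \langle y_m \rangle = \Sigma_a \left( \Sigma_a + \frac{1}{N_m} \Sigma_w \right)^{-1} \frac{1}{N_m} \sum_{n=1}^{N_m} x_{mn} + \Sigma_w (N_m \Sigma_a + \Sigma_w)^{-1} \mu, \qquad (20) $$

$$ \langle y_m y_m^T \rangle = \Sigma_w (N_m \Sigma_a + \Sigma_w)^{-1} \Sigma_a + \langle y_m \rangle \langle y_m \rangle^T. \qquad (21) $$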

As previously mentioned, Bayesian adaptation models speaker
means as posterior distributions, taking into account the
uncertainty resulting from parameter estimation with finite data, and
is therefore less sensitive to in-domain data sets with few samples
per speaker.

However, Bayesian adaptation may still suffer from in-domain
data with low channel diversity. If (1) is substituted
into (20), the mean of the posterior distribution of $y_m$ is:

$$ \langle y_m \rangle = \Sigma_a \left( \Sigma_a + \frac{1}{N_m} \Sigma_w \right)^{-1} y_m + \Sigma_w (N_m \Sigma_a + \Sigma_w)^{-1} \mu + \Sigma_a \left( \Sigma_a + \frac{1}{N_m} \Sigma_w \right)^{-1} \frac{1}{N_m} \sum_{n=1}^{N_m} c_{mn}. \qquad (22) $$

The expected value of $y_m$ in (22) includes an additive noise
term due only to channel effects. If channel components are
zero-mean, which can be expected in the case of high channel
diversity, the noise term will also be zero-mean. Conversely,
for low channel diversity, the noise term will not be zero-mean,
introducing distortion to $\langle y_m \rangle$. Furthermore, the posterior
covariance of $y_m$, given by $\Sigma_w (N_m \Sigma_a + \Sigma_w)^{-1} \Sigma_a$ as in (21), shrinks as
$N_m$ increases, causing the model to become increasingly confident
in this inaccurate estimate.
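
The behaviour described above can be checked numerically with a small sketch of the posterior moments in (20) and (21). This is an illustrative sketch, not code from [5]: the function name `speaker_posterior` and the toy values are assumptions. In the demonstration, every sample from one speaker shares the same channel offset (an extreme case of low channel diversity), so the posterior mean converges to the speaker component plus that offset while the posterior covariance keeps shrinking.

```python
import numpy as np


def speaker_posterior(Xm, mu, Sa, Sw):
    """Posterior mean and covariance of one speaker component.

    Hypothetical helper implementing the Gaussian posterior in (20)-(21).
    Xm holds the N_m samples of a single speaker, one per row.
    """
    n = Xm.shape[0]
    A = np.linalg.inv(n * Sa + Sw)
    # Covariance term of (21): Sigma_w (N_m Sigma_a + Sigma_w)^-1 Sigma_a
    cov = Sw @ A @ Sa
    # (20), using Sigma_a (Sigma_a + Sigma_w / N_m)^-1
    #            = N_m Sigma_a (N_m Sigma_a + Sigma_w)^-1
    mean = n * Sa @ A @ Xm.mean(axis=0) + Sw @ A @ mu
    return mean, cov


if __name__ == "__main__":
    y_true = np.array([1.0, 2.0])      # underlying speaker component
    offset = np.array([0.5, -1.0])     # shared, non-zero-mean channel term
    for n in (1, 10, 1000):
        Xm = np.tile(y_true + offset, (n, 1))
        mean, cov = speaker_posterior(Xm, np.zeros(2), np.eye(2), np.eye(2))
        # The mean approaches y_true + offset while trace(cov) shrinks.
        print(n, mean, np.trace(cov))
```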


4.  Unsupervised  Bayesian  Adaptation 

In the case of low channel diversity, speaker labels can be
adjusted to keep Bayesian adaptation from becoming overly
confident. For example, sets of samples from individual speakers
can be limited so that $N_m$ does not exceed a certain value. In
the extreme case, all in-domain data can be assumed to be
independent, implying that each sample is provided by a unique
speaker. Note that this assumption eliminates the requirement
for speaker labels for the in-domain data set.

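If data samples in $\mathcal{X}$ are assumed independent, implying
that each speaker contributed a single sample, the conditional
likelihood from (15) reduces to:

$$ \log p(\mathcal{X} \mid \mathcal{Y}, \Sigma_w) = \sum_{n=1}^{N_T} \log \mathcal{N}(x_n; y_n, \Sigma_w), \qquad (23) $$

where subscripts have been changed to omit the speaker label, and $N_T$
denotes the number of samples in $\mathcal{X}$. If (23) is substituted into
(14), the VB solution from (16)-(21) becomes:

$$ \mu = \alpha \bar{y} + (1 - \alpha) \mu^{out}, \qquad (24) $$

$$ \Sigma_a = \alpha \left( \frac{1}{N_T} \sum_{n=1}^{N_T} \langle y_n y_n^T \rangle - \bar{y} \bar{y}^T \right) + (1 - \alpha) \Sigma_a^{out} + \alpha (1 - \alpha) (\bar{y} - \mu^{out})(\bar{y} - \mu^{out})^T, \qquad (25) $$

$$ \Sigma_w = \frac{\alpha}{N_T} \sum_{n=1}^{N_T} \left( x_n x_n^T - \langle y_n \rangle x_n^T - x_n \langle y_n \rangle^T + \langle y_n y_n^T \rangle \right) + (1 - \alpha) \Sigma_w^{out}, \qquad (26) $$

where:

$$ \bar{y} = \frac{1}{N_T} \sum_{n=1}^{N_T} \langle y_n \rangle, \qquad (27) $$

$$ \langle y_n \rangle = \Sigma_a (\Sigma_a + \Sigma_w)^{-1} x_n + \Sigma_w (\Sigma_a + \Sigma_w)^{-1} \mu, \qquad (28) $$

$$ \langle y_n y_n^T \rangle = (\Sigma_a^{-1} + \Sigma_w^{-1})^{-1} + \langle y_n \rangle \langle y_n \rangle^T. \qquad (29) $$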

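It may provide insight to substitute (27)-(29) into (24)-(26) to
obtain update equations for $\Sigma_a$ and $\Sigma_w$. If $\Sigma_t$ denotes the
global covariance of $\mathcal{X}$ (and the sample means of $\mathcal{X}$ and the
out-of-domain sets are assumed to be zero for illustrative purposes),
the update equations become:

$$ \Sigma_a \Leftarrow \alpha \left( H_a \Sigma_t H_a^T + (\Sigma_a^{-1} + \Sigma_w^{-1})^{-1} \right) + (1 - \alpha) \Sigma_a^{out}, \qquad (30) $$

$$ \Sigma_w \Leftarrow \alpha \left( H_w \Sigma_t H_w^T + (\Sigma_a^{-1} + \Sigma_w^{-1})^{-1} \right) + (1 - \alpha) \Sigma_w^{out}, \qquad (31) $$

where $H_a = \Sigma_a (\Sigma_a + \Sigma_w)^{-1}$ and $H_w = \Sigma_w (\Sigma_a + \Sigma_w)^{-1}$.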

The proposed adaptation technique can then be interpreted as
iteratively designing Wiener filters to extract the across-class and
within-class variabilities from the total variability of the data
set. Furthermore, the out-of-domain hyperparameter estimates
serve both to provide initial estimates of these filters, and to
constrain the across-class and within-class variabilities during
optimization. It should be noted that the adaptation coefficient
cannot be set too close to 1, since the out-of-domain estimates
are required to constrain the optimization.

It is of interest to note that the update equations for $\Sigma_a$ and
$\Sigma_w$ show strong similarity to iterative Wiener filters (IWFs).
Specifically, in the trivial case of $\alpha = 1$, the update equations are
each identical to the IWF with an additive correction factor proposed
in [10], where the term $(\Sigma_a^{-1} + \Sigma_w^{-1})^{-1}$ ensures that the
property $\Sigma_t = \Sigma_a + \Sigma_w$ holds at the stationary point.
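
The updates (24)-(29) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `unsupervised_vb_adapt`, the fixed iteration count, and the absence of a convergence test are choices made here for illustration. Initialization from the out-of-domain estimates follows the description above.

```python
import numpy as np


def unsupervised_vb_adapt(X, mu_out, Sa_out, Sw_out, alpha=0.5, n_iter=10):
    """Iterate (24)-(29), treating each row of X as its own speaker.

    Hypothetical helper; no speaker labels are used anywhere.
    """
    mu, Sa, Sw = mu_out.copy(), Sa_out.copy(), Sw_out.copy()
    n_samples = X.shape[0]
    for _ in range(n_iter):
        St_inv = np.linalg.inv(Sa + Sw)
        Ha = Sa @ St_inv                 # H_a = Sigma_a (Sigma_a + Sigma_w)^-1
        Hw = Sw @ St_inv                 # H_w = Sigma_w (Sigma_a + Sigma_w)^-1
        C = Sa @ St_inv @ Sw             # equals (Sigma_a^-1 + Sigma_w^-1)^-1
        Ey = X @ Ha.T + Hw @ mu          # (28): row n holds <y_n>
        ybar = Ey.mean(axis=0)           # (27)
        Eyy = C + Ey.T @ Ey / n_samples  # sample average of (29)
        R = X - Ey                       # residuals x_n - <y_n>
        d = ybar - mu_out
        mu = alpha * ybar + (1.0 - alpha) * mu_out            # (24)
        Sa_new = (alpha * (Eyy - np.outer(ybar, ybar))
                  + (1.0 - alpha) * Sa_out
                  + alpha * (1.0 - alpha) * np.outer(d, d))   # (25)
        # (26): the bracketed sum equals the residual scatter plus C
        Sw_new = alpha * (R.T @ R / n_samples + C) + (1.0 - alpha) * Sw_out
        Sa, Sw = Sa_new, Sw_new
    return mu, Sa, Sw
```

With centered data and zero out-of-domain mean, a single iteration of this sketch reproduces the filter form in (30) and (31), which is a convenient consistency check on the algebra.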

5.  Experimental  Results 

5.1.  System  Description 

This section presents experimental results for domain adaptation
with inadequate in-domain data. The baseline system uses
600-dimensional i-vectors, with global centering and whitening
applied prior to length normalization [11]. The system uses
40-dimensional cepstral features including deltas, with mean and
variance normalization. All speaker verification results are presented
for PLDA scoring, in terms of equal error rate (EER) and
minimum decision cost function (mindcf), and pooled across
gender. For all adaptation methods, an adaptation coefficient of
$\alpha = 0.5$ was used. As baseline methods, we use MAP adaptation
with point estimates [3] and supervised Bayesian adaptation [5].
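
As a rough illustration of the centering, whitening, and length normalization steps mentioned above, the following sketch estimates a whitening transform from a data set and applies it. The function names and the eigendecomposition-based whitening are assumptions made here; the text does not specify which whitening transform the system uses.

```python
import numpy as np


def fit_whitener(X):
    """Estimate a mean and whitening matrix from the rows of X.

    Hypothetical helper: whitening via eigendecomposition of the sample
    covariance, so that (X - mean) @ W has identity sample covariance.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    # Scale each eigenvector (column) by 1 / sqrt(eigenvalue).
    W = evecs / np.sqrt(evals)
    return mean, W


def center_whiten_lengthnorm(X, mean, W):
    """Center and whiten the rows of X, then scale each row to unit length."""
    Z = (X - mean) @ W
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)
```

Under this sketch, adapting only the whitening stage would amount to calling `fit_whitener` on in-domain vectors instead of out-of-domain ones while leaving the PLDA hyperparameters unchanged.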

5.2.  Domain  Adaptation  with  Few  Samples  Per  Speaker 

Table 1 provides speaker verification results for a variety of domain
adaptation strategies, when applied to the DAC13 [8]. The
out-of-domain system was trained on the Switchboard-I and
Switchboard-II corpora, and the in-domain data included telephone
calls from SRE04-SRE08. The in-domain data was reduced
to contain only 2 randomly drawn samples per speaker.
Five random draws of the in-domain set were tested, and results
represent the average of these. In Table 1, none refers to the
unadapted out-of-domain system, and whitening refers to using
in-domain data solely to adapt i-vector whitening and centering
prior to length normalization. In Table 1, MAP adaptation
with point estimates [3] suffers degradation since speaker components
estimated with (6) are highly variable for small values
of $N_m$, as discussed in Sec. 3.2. However, Bayesian adaptation
from [5] and the proposed unsupervised Bayesian adaptation
perform well since they take into account the uncertainty
present when estimating speaker components from limited data.
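
The variability of the point estimate in (6) for small $N_m$ can be illustrated with a short simulation. This is a sketch under assumed toy parameters, and the function name `point_estimate_error_cov` is hypothetical. It draws zero-mean channel components as in (3), averages them per speaker as in (9), and compares the empirical covariance of the estimation error with $\Sigma_w / N_m$.

```python
import numpy as np


def point_estimate_error_cov(Sw, n_per_speaker, n_speakers, rng):
    """Empirical covariance of the point-estimate error for N_m samples.

    Hypothetical helper: by (9) the error is the mean of the N_m channel
    components, so its covariance should be close to Sigma_w / N_m.
    """
    dim = Sw.shape[0]
    c = rng.multivariate_normal(np.zeros(dim), Sw,
                                size=(n_speakers, n_per_speaker))
    err = c.mean(axis=1)               # one error vector per speaker
    return err.T @ err / n_speakers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Sw = np.diag([1.0, 0.5])
    for n in (2, 10, 50):
        # Compare the simulated error covariance with Sigma_w / N_m.
        print(n, point_estimate_error_cov(Sw, n, 20000, rng), Sw / n)
```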

Table 1: Speaker verification results on the DAC13 task, using
an in-domain set with 2 samples per speaker

adaptation method               EER (%)   mindcf
none                            6.41      0.471
whitening                       5.25      0.412
MAP with point estimates [3]    3.18      0.296
Supervised Bayesian [5]         2.74      0.271
Unsupervised Bayesian           2.92      0.270

5.3.  Domain  Adaptation  with  Low  Channel  Diversity 

Realistic in-domain data sets may also be inadequate due to limited
channel diversity. Table 2 provides results for the DAC13
task when the in-domain set only includes samples from the
dominant phone number for each speaker, so that the in-domain
set consisted of ~24k samples from ~3800 speakers. In this
case, the baseline methods suffer degradation since both will
produce distorted speaker component estimates when channel
components have non-zero mean, as discussed in Sec. 3.3. The
proposed method, however, does not rely on channel diversity
per speaker, and suffers no such degradation.

Table 2: Speaker verification results on the DAC13 task, using
an in-domain set with a single phone number from each speaker

adaptation method               EER (%)   mindcf
none                            6.41      0.471
whitening                       5.17      0.413
MAP with point estimates [3]    5.51      0.424
Supervised Bayesian [5]         5.47      0.422
Unsupervised Bayesian           3.07      0.278

5.4.  Domain  Adaptation  with  Resource-rich  Data 

Table 3 provides verification results for the DAC13 task for the
full in-domain set, consisting of ~36k samples from ~3800
speakers, with an average of 9.6 samples per speaker and 2.8
phone numbers per speaker. This can be considered a resource-rich
adaptation set, and the systems from [3] and [5] can both
be expected to perform well. However, we wish to verify that
the proposed method remains competitive, even though it is not
able to leverage the rich information provided by speaker labels.
It can be observed in Table 3 that the baseline methods provide
excellent results, and that the proposed technique suffers only a
slight degradation.

Table 3: Speaker verification results on the DAC13 task, using
the full in-domain set

adaptation method               EER (%)   mindcf
none                            6.41      0.471
whitening                       5.20      0.413
MAP with point estimates [3]    2.54      0.254
Supervised Bayesian [5]         2.40      0.245
Unsupervised Bayesian           2.96      0.269

5.5.  Domain  Adaptation  on  the  SRE16 

To verify that the observations made from Tables 1-3 generalize
to other data sets, we performed experiments on the 2016
Speaker Recognition Evaluation (SRE16) [12] fixed task. In
this case, the out-of-domain system was trained using data
from SRE04-SRE12. The SRE16 included trials in two non-English
languages and from unseen channels. An inadequate
in-domain set was provided, which consisted of 2272 samples
from 1164 speakers, and spanned both unseen languages. In the
set, speakers provided an average of 1.9 samples from 1.1 different
phone numbers, making this data deficient with respect to both
samples per speaker and channel diversity. An adaptation
coefficient of $\alpha = 0.1$ was used. It can be observed in
Table 4 that the baseline technique from [3] provides benefit in
terms of mindcf, and supervised Bayesian adaptation [5] provides
improvements in terms of both EER and mindcf. The unsupervised
Bayesian approach provides results which are competitive
with [5], even though speaker labels are not required for
the in-domain data.

Table 4: Speaker verification results on the SRE16 fixed task

adaptation method               EER (%)   mindcf
none                            19.35     0.999
whitening                       16.70     0.972
MAP with point estimates [3]    18.26     0.775
Supervised Bayesian [5]         15.49     0.776
Unsupervised Bayesian           15.32     0.723

6.  Conclusions 

We have proposed a technique for unsupervised Bayesian domain
adaptation for i-vector speaker verification. The method
shows improved effectiveness for inadequate in-domain data.
Specifically, the method performs well even when in-domain
sets include speakers with few samples or low channel diversity.
The proposed technique provides competitive results on a range
of domain adaptation experiments with inadequate data, even
when compared to supervised systems which require speaker
labels for the in-domain set.